Modeling Topics

Author

  • Kevin Gimpel
Abstract

Many applications in machine learning, natural language processing, and information retrieval require methods for representing and computing with text documents. In this review, we discuss techniques that use latent, topical information in text documents for solving problems in these fields. We consider approaches to problems such as document retrieval, topic tracking, novel event detection, document classification, and language modeling. In doing so, we provide snapshots of the evolution of topic-oriented techniques during the 1990s and settle our discussion on more recent work in probabilistic topic modeling. The current paradigm, characterized by the latent Dirichlet allocation (LDA) model [Blei et al., 2003], consists of probabilistic document modeling in which topics are expressed as hidden random variables. We highlight connections among the literature through the years and discuss possible directions of future work.
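
As a rough illustration of what "topics expressed as hidden random variables" means, the sketch below samples a toy document from an LDA-style generative process. It is a minimal sketch under stated assumptions, not the formulation used in any of the surveyed papers: the four-word vocabulary, the two topic-word distributions, the Dirichlet parameters, and the helper names `sample_dirichlet` and `generate_document` are all hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Sample one document; each token's topic assignment is a hidden variable.

    topics: list of per-topic word distributions (each sums to 1).
    alpha:  Dirichlet parameters for the per-document topic proportions.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topic_ids = range(len(topics))
    word_ids = range(len(topics[0]))
    assignments, words = [], []
    for _ in range(length):
        z = rng.choices(topic_ids, weights=theta)[0]     # hidden topic
        w = rng.choices(word_ids, weights=topics[z])[0]  # observed word
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy setup (illustrative values, not taken from the paper).
vocab = ["gene", "cell", "court", "law"]
topics = [
    [0.5, 0.5, 0.0, 0.0],  # assumed "biology" topic
    [0.0, 0.0, 0.5, 0.5],  # assumed "legal" topic
]
rng = random.Random(0)
theta, assignments, words = generate_document(topics, [0.5, 0.5], 8, rng)
print([vocab[w] for w in words])
```

Only the words would be observed in practice; inference in a topic model runs this process in reverse, estimating the hidden topic proportions and assignments from the text.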

Similar resources

Latent Dirichlet Allocation with Topic-in-Set Knowledge

Latent Dirichlet Allocation is an unsupervised graphical model which can discover latent topics in unlabeled data. We propose a mechanism for adding partial supervision, called topic-in-set knowledge, to latent topic modeling. This type of supervision can be used to encourage the recovery of topics which are more relevant to user modeling goals than the topics which would be recovered otherwise...

Analysis of Computational Science Papers from ICCS 2001-2016 using Topic Modeling and Graph Theory

This paper presents results of topic modeling and network models of topics using the ICCS corpus, which contains domain-specific (computational science) papers over sixteen years (a total of 5695 papers). We discuss topical structures of ICCS, how these topics evolve over time in response to the topicality of various problems, technologies and methods, and how all these topics relate to one ano...

A Knowledge-based Topic Modeling Approach for Automatic Topic Labeling

Probabilistic topic models, which aim to discover latent topics in text corpora, define each document as a multinomial distribution over topics and each topic as a multinomial distribution over words. Although humans can infer a proper label for each topic by looking at its top representative words, this approach is not applicable for machines. Automatic Topic Labeling techniques try to add...

Integrating Document Clustering and Topic Modeling

Document clustering and topic modeling are two closely related tasks which can mutually benefit each other. Topic modeling can project documents into a topic space which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In thi...

Topics in Brain Signal Processing

This brief paper provides an introduction to the area of brain signal processing, and also serves as an introductory presentation for the special session entitled Advanced Signal Processing of Brain Signals: Methods and Applications at APSIPA 2010. Several topics related to the processing of brain signals are discussed: preprocessing, inverse modeling (a.k.a. source modeling), and signal decodi...

Skew-slash distribution and its application in topics regression

In many statistical modeling problems, the common assumption is that observations are normally distributed. In many real data applications, however, the true distribution deviates from the normal. Thus, the main concern of most recent studies on analyzing data is the construction and use of alternative distributions. In this regard, new classes of distributions such as slash and skew-sla...

Publication date: 2006